ABSTRACT

Daily deal sites have become the latest Internet sensation, 
providing discounted offers to customers for restaurants, tick- 
eted events, services, and other items. We begin by under- 
taking a study of the economics of daily deals on the web, 
based on a dataset we compiled by monitoring Groupon and 
LivingSocial sales in 20 large cities over several months. We 
use this dataset to characterize deal purchases; glean insights 
about operational strategies of these firms; and evaluate cus- 
tomers' sensitivity to factors such as price, deal schedul- 
ing, and limited inventory. We then marry our daily deals 
dataset with additional datasets we compiled from Facebook 
and Yelp users to study the interplay between social net- 
works and daily deal sites. First, by studying user activity 
on Facebook while a deal is running, we provide evidence 
that daily deal sites benefit from significant word-of-mouth 
effects during sales events, consistent with results predicted 
by cascade models. Second, we consider the effects of daily 
deals on the longer-term reputation of merchants, based on 
their Yelp reviews before and after they run a daily deal. 
Our analysis shows that while the number of reviews in- 
creases significantly due to daily deals, average rating scores 
from reviewers who mention daily deals are 10% lower than 
scores of their peers on average. 

1. INTRODUCTION 

Groupon and LivingSocial are websites offering various 
deals-of-the-day, with localized deals for major geographic 
markets. Groupon in particular has been one of the fastest 
growing Internet sales businesses in history, with tens of 
millions of registered users and 2011 sales expected to exceed 
1 billion dollars. 

We briefly describe how daily deal sites work; additional 
details relevant to our measurement methodology will be 
given subsequently. In each geographic market, or city, there 
are one or more deals of the day. Generally, one deal in 
each market is the featured deal of the day, and receives 
the prominent position on the primary webpage targeting 
that market. The deal provides a coupon for some product 
or service at a substantial discount (generally 40-60%) to 
the list price. Deals may be available for one or more days. 
We use the term size of a deal to represent the number of 
coupons sold, and the term revenue of a deal to represent 
the number of coupons multiplied by the price per coupon. 
Groupon retains approximately half the revenue from the discounted coupons [10] and provides the rest to the seller, as does LivingSocial. Each Groupon deal has a minimum threshold size that must be reached for the deal to take hold, and sellers may also set a maximum threshold size to limit the number of coupons sold.
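The definitions of size and revenue, together with the two thresholds, can be made concrete with a small sketch. It assumes that a deal sells nothing if demand never reaches the minimum threshold, that an optional maximum caps the coupons sold, and that a flat 50% site share stands in for the approximate split reported above. The function name and parameters are hypothetical, and real merchant contracts may differ.

```python
from typing import Optional, Tuple


def deal_outcome(demand: int, price: float, min_threshold: int,
                 max_threshold: Optional[int] = None,
                 site_share: float = 0.5) -> Tuple[int, float, float, float]:
    """Return (size, revenue, site_cut, merchant_cut) for a hypothetical deal.

    demand: coupons customers would buy; min_threshold: the tipping point;
    max_threshold: optional seller-imposed cap; site_share: fraction of
    revenue kept by the deal site (0.5 approximates the split in the text).
    """
    size = demand if max_threshold is None else min(demand, max_threshold)
    if size < min_threshold:
        size = 0  # the deal never takes hold, so no coupons are sold
    revenue = size * price  # revenue = coupons sold x price per coupon
    site_cut = revenue * site_share
    return size, revenue, site_cut, revenue - site_cut


print(deal_outcome(600, 20.0, 100, max_threshold=500))  # (500, 10000.0, 5000.0, 5000.0)
print(deal_outcome(50, 20.0, 100))                      # (0, 0.0, 0.0, 0.0)
```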

Daily deal sites represent a change from recent Internet 
advertising trends. While large-scale e-mail distributions 
for sale offers are commonplace (generally in the form of 
spam) and coupon sites have long existed on the Internet, 
Groupon and LivingSocial have achieved notable success 
with their emphasis on higher quality localized deals, as well 
as their marketing savvy both with respect to buyers and 
sellers (merchants). This paper represents an attempt to 
gain insight into the success of this business model, using a 
combination of data analysis and modeling. 

The contributions of the paper are as follows: 

• We compile and analyze datasets we gathered moni- 
toring Groupon over a period of six months and Liv- 
ingSocial over a period of three months in 20 large US 
markets. Our datasets will be made publicly available 
(on publication of this paper). 

• We consider how the price elasticity of demand, as well 
as what we call "soft incentives", affect the size and 
revenue of Groupon and LivingSocial deals empirically. 
Soft incentives include deal aspects other than price, 
such as whether a deal is featured and what days of 
the week it is available. 

• We study the predictability of the size of Groupon 
deals, based on deal parameters and on temporal progress. 
We show that deal sizes can be predicted with moder- 
ate accuracy based on a small number of parameters, 
and with substantially better accuracy shortly after a 
deal goes live. 

• We examine dependencies between the spread of Groupon 
deals and social networks by cross-referencing our Groupon 
dataset with Facebook data tracking the frequency 
with which users "like" Groupon deals. We offer evi- 
dence that propagation of Groupon deals is consistent 
with predictions of social spreading made by cascade 
models. 

• We examine the change in reputation of merchants 
based on their Yelp reviews before and after they run 
a Groupon deal. We find that reviewers mentioning 
daily deals are significantly more negative than their 
peers on average, and the volume of their reviews ma- 
terially lowers Yelp scores in the months after a daily 
deal offering. 

We note that we presented preliminary findings, based on a single month of Groupon data and focused predominantly on the issue of soft incentives, in a technical report [3]. The current paper enriches that study in several ways, both in its consideration of LivingSocial as a comparison point and especially in our use of social network data sources, such as Facebook and Yelp, to study deal sites. Indeed, we believe this use of multiple disparate data sources, while not novel as a research methodology, appears to be original in the context of gaining insight into deal sites.

Before continuing, we acknowledge that a reasonable question is why we gathered data ourselves instead of asking Groupon for data; such data (if provided) would likely be more accurate and possibly more comprehensive. We offer several justifications. First, by gathering our own data, we can make it public, for others to use and to verify our results. Second, by relying on a deal site as a source for data, we would be limited to the data they were willing to provide, as opposed to the data we thought we needed (and that was publicly available). Gathering our own data also motivated us to gather and compare data from multiple sources. Finally, due to fortuitous timing, Groupon's recent S-1 filing [10] allowed us to validate several aggregate measures of the datasets we collected.

Related Work on Daily Deals: To date, there has been little previous work examining Groupon and LivingSocial specifically. Edelman et al. consider the benefits and drawbacks of using Groupon from the side of the merchant, modeling whether the advertising and price discrimination effects can make such discounts profitable [9]. Dholakia polls businesses to determine their experience of providing a deal with Groupon [8], and Arabshahi examines their business model [2]. Several works have studied other online group buying schemes that arose before Groupon and that utilize substantially different dynamic pricing schemes [1, 12]. Ye et al. recently provided a stochastic "tipping point" model for sales from daily deal sites that incorporates social network effects [5T]. They provide supporting evidence for their model using a Groupon data set they collected that is similar to, but less comprehensive than, ours; however, they do not measure social network activity.

2. THE DAILY DEALS LANDSCAPE 

In this section, we describe the current landscape of daily 
deal sites exemplified by Groupon and LivingSocial. We 
start by describing the measurement methodology we em- 
ployed to collect longitudinal data from these sites, and 
provide additional background on how these sites operate. 
We then describe basic insights that can be gleaned directly 
from our datasets, including revenue and sales broken out 
by week, by deal, by geographic location, and by deal type. 
Moving on, we observe that given an offering, daily deal 
sites can optimize the performance of the offering around 
various parameters, most obviously price, but also day-of- 
week, duration, etc. We explore these through the lens of 
our datasets. 

2.1 Measurement Methodology 

We collected longitudinal data from the top two group 
deal sites, Groupon and LivingSocial, as well as from Face- 
book and Yelp. Our datasets are complex and we describe 
them in detail below. 

2.1.1 Deal data 

We collected data from Groupon between January 3rd and July 3rd, 2011. We monitored, to the best of our knowledge, all deals offered in 20 different cities during this period. Our criteria for city selection were population and geographic distribution. Specifically, our list of cities comprises: Atlanta, Boston, Chicago, Dallas, Detroit, Houston, Las Vegas, Los Angeles, Miami, New Orleans, New York, Orlando, Philadelphia, San Diego, San Francisco, San Jose, Seattle, Tallahassee, Vancouver, and Washington DC. In total, our data set contains statistics for 16,692 deals.

Each Groupon deal is associated with a set of features: 
the deal description, the retail and discounted prices, the 
start and end dates, the threshold number of sales required 
for the deal to be activated, the number of coupons sold, 
whether the deal was available in limited quantities, and whether
it sold out. Each deal is also associated with a category such
as "Restaurants", "Nightlife", or "Automotive". From these 
basic features we compute further quantities of interest such 
as the revenue derived by each deal, the deal duration, and 
the percentage discount. 
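A minimal sketch of how these derived quantities could be computed from the collected features. The `Deal` class and its field names are hypothetical, and counting duration with the end date as the inclusive last day of sale is an assumption, not necessarily how our collection scripts count it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deal:
    """Hypothetical record holding the per-deal features listed above."""
    description: str
    category: str
    retail_price: float
    discounted_price: float
    start: date
    end: date  # assumed to be the last day the deal is on sale (inclusive)
    threshold: int
    coupons_sold: int
    limited: bool
    sold_out: bool

    @property
    def revenue(self) -> float:
        return self.coupons_sold * self.discounted_price

    @property
    def duration_days(self) -> int:
        # Inclusive count: a deal that starts and ends on the same day lasts 1 day.
        return (self.end - self.start).days + 1

    @property
    def discount_pct(self) -> float:
        return 100.0 * (1.0 - self.discounted_price / self.retail_price)


d = Deal("Dinner for two", "Restaurants", 40.0, 20.0,
         date(2011, 3, 4), date(2011, 3, 6), 25, 300, False, False)
print(d.revenue, d.duration_days, d.discount_pct)  # 6000.0 3 50.0
```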

With each Groupon deal, we collected intraday time-series data that monitor two time-varying parameters: cumulative sales, and whether or not a given deal is currently featured. To compile these time series, we monitored each deal at roughly ten-minute intervals and downloaded the value of the sales counter. Occasionally some of our requests failed, and therefore some gaps are present in our time-series data, but this does not materially affect our conclusions. The second parameter we monitored was whether a deal was featured or not, with featured deals being those that are presented in the subject line of daily subscriber e-mails and given prominent presentation on the associated city's webpage. For example, visiting groupon.com/boston, one notices that a single deal occupies a significant proportion of the screen real estate, while the rest of the concurrently active deals are summarized in a smaller sidebar.
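Because failed requests leave gaps, any analysis that needs cumulative sales at a specific moment (for example, at 7am on a deal's first day) has to estimate it from the surrounding samples. The sketch below assumes linear interpolation between the two nearest samples and declines to extrapolate beyond the observed range; the function name and the (timestamp, count) representation are illustrative assumptions, not our actual pipeline.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional, Tuple


def sales_at(samples: List[Tuple[datetime, int]], when: datetime) -> Optional[float]:
    """Estimate cumulative sales at `when` from time-sorted (timestamp, count) samples.

    Gaps left by failed requests are bridged by linear interpolation between
    the neighbouring samples (an assumption). Outside the sampled range the
    function returns None instead of extrapolating.
    """
    times = [t for t, _ in samples]
    i = bisect_left(times, when)
    if i < len(times) and times[i] == when:
        return float(samples[i][1])  # exact sample available
    if i == 0 or i == len(times):
        return None  # before the first or after the last sample
    (t0, c0), (t1, c1) = samples[i - 1], samples[i]
    frac = (when - t0).total_seconds() / (t1 - t0).total_seconds()
    return c0 + frac * (c1 - c0)


# The 7:00 poll failed; estimate it from the 6:50 and 7:10 samples.
samples = [(datetime(2011, 3, 1, 6, 50), 100), (datetime(2011, 3, 1, 7, 10), 140)]
print(sales_at(samples, datetime(2011, 3, 1, 7, 0)))  # 120.0
```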

Although Groupon has a public API¹ through which one can obtain some basic deal information, we decided to also monitor the Groupon website directly. Our primary rationale was that certain deal features, such as whether a link to reviews for the merchant offering the deal was present, were not available through the Groupon API. We used the API to obtain a category for each deal and to validate the sales data we collected. Observed discrepancies were infrequent and small; in these cases we used the API-collected data as the ground truth. We did not use the API to collect time-series data.
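The reconciliation rule described above (API values override scraped values wherever both exist, while website-only fields are kept) might look like the following sketch. The field names and the dictionary representation are hypothetical.

```python
from typing import Any, Dict, Tuple


def reconcile(scraped: Dict[str, Any],
              api: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Tuple[Any, Any]]]:
    """Merge one deal's scraped fields with its API fields (hypothetical schema).

    Website-only fields (e.g. whether a review link was shown) are kept from
    the scrape; every field the API reports overrides the scraped value, and
    any disagreement is recorded as (scraped, api) for later inspection.
    """
    merged = dict(scraped)
    discrepancies = {}
    for field, api_value in api.items():
        if field in scraped and scraped[field] != api_value:
            discrepancies[field] = (scraped[field], api_value)
        merged[field] = api_value  # API value treated as ground truth
    return merged, discrepancies


merged, diffs = reconcile({"sold": 412, "review_link": True},
                          {"sold": 415, "category": "Restaurants"})
print(merged)  # {'sold': 415, 'review_link': True, 'category': 'Restaurants'}
print(diffs)   # {'sold': (412, 415)}
```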

We collected data from LivingSocial between March 21st 
and July 3rd, 2011 for the same set of 20 cities. In total, 
our LivingSocial dataset contains 2,609 deals. LivingSocial 
deals differ from their Groupon counterparts in that they 
have no tipping point, and in that they do not explicitly 
indicate whether they are available in limited quantities (al- 
though they do sell out occasionally). LivingSocial runs two 
types of deals: one featured deal per day, and a secondary 
"Family Edition" deal, which offers family-friendly activities, 
and receives less prominent placement on the LivingSocial 
website. For LivingSocial deals we only collected data on 
their outcomes; we did not collect time-series data. 

2.1.2 Facebook data

Both Groupon and LivingSocial display a Facebook Like button for each deal, where the Like button is associated with a counter representing the number of Facebook users who have clicked the button to express their positive feedback. We refer to the value of the counter as the number of likes a deal has received, and we collected this value for each Groupon and LivingSocial deal in our dataset.

Figure 2: Revenue and coupons sold per deal, week-over-week. (a) Groupon; (b) LivingSocial.

1 http://www.groupon.com/pages/api

As a technical aside, we mention that Groupon and Liv- 
ingSocial have different implementations of the Facebook 
Like button that necessitated our collecting data from them 
in different ways. Within each deal page, Groupon embeds 
code that dynamically renders the Facebook Like button. It 
does so by sending Facebook a request that contains a unique 
identifier associated with the corresponding deal page. We 
extracted the unique identifier from Groupon deal pages and 
directly contacted Facebook to obtain the number of likes for 
every deal. LivingSocial instead hard-codes the Like button 
and its associated counter within each page. As we could 
not obtain the identifier associated with each LivingSocial 
deal, we could not query Facebook to independently obtain 
the number of likes, and thus we collected the hard-coded 
number from LivingSocial deal pages. 

2.1.3 Yelp data 

Groupon occasionally displays reviews for the merchant offering the deal in the form of a star rating, as well as selected reviewer comments. The reviews are sourced from major review sites such as Yelp, Citysearch, and TripAdvisor. For Groupon deals that were linked to Yelp reviews, we collected the individual reviewer ratings and comments left by customers on Yelp. We collected this dataset during the first week of September 2011. In total, our dataset contains 56,048 reviews for 2,332 merchants who ran 2,496 deals on Groupon during our monitoring period. Yelp has implemented one measure to discourage the automated collection of reviews that directly affected our data collection: it hides a set of user reviews for each merchant. To see the hidden reviews, one has to solve a CAPTCHA. We did not attempt to circumvent CAPTCHAs, and we do not know whether the hidden reviews are randomly selected or selected by other criteria. Since Yelp reports the total number of reviews available per merchant, we ascertained that approximately 23% of all reviews for merchants in our dataset were hidden from our collection.
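The hidden-review estimate amounts to comparing the review totals Yelp reports with the number of reviews actually collected. This sketch pools counts across merchants, which is an assumption (a per-merchant average would weight small merchants more heavily); the function name and the example counts are made up.

```python
from typing import Sequence


def hidden_fraction(reported: Sequence[int], collected: Sequence[int]) -> float:
    """Pooled share of reviews not collected across all merchants (an assumption).

    reported: per-merchant review totals as displayed by Yelp.
    collected: per-merchant number of reviews actually retrieved.
    """
    return 1.0 - sum(collected) / sum(reported)


# Made-up counts for three hypothetical merchants.
print(round(hidden_fraction([100, 50, 50], [77, 40, 37]), 2))  # 0.23
```

Under this pooled definition, 56,048 collected reviews with 23% hidden would correspond to roughly 72,800 reviews reported by Yelp in total.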

2.2 Operational Insights 

Figure 1 serves as an overview of insights we are able to
gain using our dataset. It displays the weekly revenue as well 
as the weekly sales of coupons across all 20 cities we moni- 
tored for Groupon and LivingSocial, respectively. Notably, 
while both Groupon and LivingSocial are widely regarded as 
companies enjoying extremely rapid growth, our first take- 
away from these plots is that sales and revenue in these 20 
established markets are relatively flat across the time pe- 
riod. We conjecture that much of the reported growth is in 
newer markets. 

By happenstance, Groupon's recent S-1 filing [10] provided financial information that allowed us to validate some of the aggregate revenue data that we collected independently. For example, the filing states that in the three months ending March 31, 2011, Groupon had sold 950,689 deals in Chicago, earning $21.5M in revenues. Our dataset accounts for 967,244 deals sold and $21.3M in revenues. For the same period in Boston, the filing reports 388,178 deals sold for total revenues of $9.3M, compared to 362,823 deals and $8.7M in our dataset. In both cases, our observations closely match what the company reported. The small differences may have arisen for several reasons: we did not monitor all revenue-generating activities (such as direct backchannels provided to merchants); some deals were offered at multiple prices, but we only monitored the main one; our accounting methods might differ from Groupon's; we could not account for refunds; and our monitoring infrastructure may have overlooked some deals.

Figure 3: Deal size by price (in logarithmic scale). Black dots highlight deals in LA; grey dots are used for other cities. A trend line fitted using OLS regression for all data is also shown.

Outliers in our plots are not due to especially strong performance in local markets, but instead seem to be generated by large national deals. The three most significant outliers for both Groupon and LivingSocial each correspond to a large national offer. Representative deals noted in Figure 1 include $10 for $20 in Barnes & Noble merchandise during the week of 1/31 for Groupon, and 7 nights in the Caribbean during the week of 6/13 for LivingSocial.

The Groupon plot is annotated with two significant events from the February time period that mark an unusual decline in revenues. On February 6th, Groupon's Super Bowl commercials aired and were found offensive by many people; Groupon apologized on its official blog and pulled the controversial ad campaign [11]. Subsequently, from February 9th to February 11th, in advance of Valentine's Day, Groupon offered a nationwide "$20 for $40 Worth of Flowers" coupon in partnership with FTD Flowers. Shortly thereafter, customers started realizing that when visiting the FTD Flowers website via Groupon, they were shown much higher prices for the same products than customers who visited the FTD site directly. Furthermore, coupon purchasers were only allowed to use their coupons on the more expensive version of the product. This deal was widely perceived as a "bait and switch" scheme, and Groupon offered disgruntled customers refunds² [18]. The recovery that we observe in the revenue indicates that this shock may only have had a short-term effect on Groupon's business. Similar short-term effects have been observed in the time series of YouTube video views after exogenous shocks m.

2 Since we cannot account for these refunds, the reported sales and revenues of the week starting on February 7 are an overestimate.
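The agreement with the S-1 filing in the validation comparison above can be quantified as signed percentage differences. This short sketch uses only the Chicago and Boston figures quoted there; `pct_diff` is a hypothetical helper name.

```python
def pct_diff(ours: float, reported: float) -> float:
    """Hypothetical helper: signed % difference of our estimate vs. the filing."""
    return 100.0 * (ours - reported) / reported


# (our dataset, S-1 filing) for the three months ending March 31, 2011.
checks = {
    "Chicago coupons": (967_244, 950_689),
    "Chicago revenue ($M)": (21.3, 21.5),
    "Boston coupons": (362_823, 388_178),
    "Boston revenue ($M)": (8.7, 9.3),
}
for name, (ours, reported) in checks.items():
    print(f"{name}: {pct_diff(ours, reported):+.1f}%")
```

It prints +1.7% and -0.9% for Chicago and -6.5% for both Boston measures: the Chicago figures agree within 2%, while the Boston figures run about 6.5% below the filing, a gap plausibly explained by the undercounting sources listed above.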

A different phenomenon is captured in Figure 2, where we plot week-over-week sales and revenue on a per-deal basis. For Groupon, sales and revenue per deal in the 20 cities
we monitored peaked in February 2011 and have trended 
steadily downward since. For LivingSocial, we observe much 
greater variability, obscuring any underlying trend, but per- 
deal sales and revenue appear flat or declining. While these 
trends could potentially point to underlying fragility in the 
daily deals business model (possibly the best revenue-producing 
merchants in a geographical area are becoming exhausted), 
more benign explanations exist. The trend could reflect a 
change in operational strategy to broaden the base of fea- 
tured deals available to subscribers at any given time, for 
example, to provide better personalization of deals to sub- 
scribers. However, revenue growth in established markets 
appears to be a potential challenge facing daily deal sites. 

2.3 Pricing and Other Incentives 

We now consider a range of factors that influence the num- 
ber of coupons sold and that daily deal sites can control. Our 
primary focus is on Groupon. As daily deal sites have built 
their business around deeply discounted offers, one might ex- 
pect that the discounted price associated with a deal would 
be the primary purchase incentive. Indeed, in the log-log scatterplots of deal sizes vs. deal prices depicted in Figure 3, the trend lines fitted using ordinary least-squares (OLS) regression indicate that the logarithm of price and the logarithm of sales are roughly linearly related. While there is a large amount of variance within individual price points, the trend becomes clearer when we control for other features, for example by restricting attention to deals in Los Angeles (black points).
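The trend lines in Figure 3 are ordinary least-squares fits in log-log space. A minimal pure-Python sketch of such a fit follows; the function name is hypothetical, and the example uses synthetic data generated from an exact power law with slope -0.5, not our dataset.

```python
import math
from typing import Sequence, Tuple


def loglog_ols(prices: Sequence[float], sizes: Sequence[float]) -> Tuple[float, float]:
    """Fit log(size) = a + b * log(price) by OLS; returns (a, b).

    Natural logarithms are used. b is the slope of the trend line in a
    log-log scatterplot of deal size against price.
    """
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


prices = [10, 20, 40, 80]
sizes = [2000 * p ** -0.5 for p in prices]  # synthetic, exact power law
a, b = loglog_ols(prices, sizes)
print(round(b, 3))  # -0.5
```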

A closely related deal feature is the magnitude of the discount, which Groupon displays prominently in advertising a deal. While we conjecture that customers are sensitive to this quantity, most discounts presented fall in a relatively narrow range (three-quarters of all Groupon deals are discounted by 40 to 60%). This has the effect of making the list price highly correlated (0.90 correlation coefficient) with the discounted price, and therefore the discount is a weak distinguishing feature once price has been taken into account.

Figure 4: Groupon deal sizes (top row) and deal revenues (bottom row) are shown on the x-axis. On the y-axis, deals are broken down by various deal features. The outer whiskers mark the 5% and 95% quantiles of the deal size, the sides of each grey rectangle the 1st and 3rd quartiles, the solid bar the median, and the solid dot the mean.

Mean revenue    Number of deals
$12,576         5,464
$18,375         5,745
$20,010         3,877
$20,189         1,606

Table 1: Groupon deals by duration.
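Why a narrow discount band makes list price and discounted price nearly interchangeable can be illustrated with a toy example: when every discount lies between 40% and 60%, the discounted price is roughly a fixed fraction of the list price, so the two are strongly correlated. The prices and discounts below are invented for illustration and are not drawn from our dataset.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Invented deals: list prices spread widely, discounts confined to 40-60%.
list_prices = [20, 35, 50, 80, 120, 200, 300, 500]
discounts = [0.40, 0.60, 0.50, 0.45, 0.55, 0.60, 0.40, 0.50]
deal_prices = [lp * (1 - d) for lp, d in zip(list_prices, discounts)]
print(round(pearson(list_prices, deal_prices), 3))
```

On these made-up prices the correlation is well above 0.9, so a model that already contains the discounted price gains little independent information from the discount.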

Deal sites can also control the length of time a deal runs. As shown in Table 1, while deal size varies minimally with duration, revenue increases, suggesting perhaps that Groupon is attempting to hit sales targets rather than revenue targets. Expensive deals may generally sell less quickly and therefore must be allowed more time to achieve the same sales goal.

At any point in time, and for each geographic market, 
one deal among the set of all available deals is featured by 
Groupon. Featured deals receive prominent placement both 
on the Groupon website and in the daily email that cus- 
tomers receive. In our dataset, 22% of deals were featured. 
The impact of featured placement is significant: the mean 
sales and revenue for featured deals are well in excess of twice 
the corresponding quantities for other deals. The effects of 
featuring a deal are summarized in Table 2. However, we cannot assume that these outcomes are entirely causal effects of featuring a deal, as Groupon naturally has an incentive to feature those deals that will drive the most sales. We investigated whether certain categories of deals as a whole were featured more frequently. Table 3 breaks down categories with at least 100 deals by the percentage of deals within each that were featured.
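The claims about featured placement can be checked directly against the Table 2 figures; this snippet uses nothing but those numbers.

```python
# Table 2 values: (mean sales, mean revenue in $, number of deals)
featured = (1_443, 34_181, 3_644)
non_featured = (475, 12_241, 13_048)

sales_ratio = featured[0] / non_featured[0]
revenue_ratio = featured[1] / non_featured[1]
share_featured = featured[2] / (featured[2] + non_featured[2])
print(f"sales x{sales_ratio:.2f}, revenue x{revenue_ratio:.2f}, "
      f"featured share {share_featured:.1%}")
```

This prints `sales x3.04, revenue x2.79, featured share 21.8%`: featured deals average roughly three times the sales and 2.8 times the revenue of other deals, and about 22% of the 16,692 deals were featured.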

                  Featured   Non-featured
Mean sales           1,443            475
Mean revenue       $34,181        $12,241
Number of deals      3,644         13,048

Table 2: Groupon deals by placement.

Another distinguishing characteristic is inventory size: some deals are available only in limited quantities. Groupon lets its customers know which deals are limited, but does not display the number of available units (while some competitors of Groupon display this prominently). Approximately 31% of all deals in our dataset are available in limited numbers, with wide variation across categories. Approximately 50% of all "Travel" deals and 39% of all "Arts and Entertainment" deals were available in limited quantities, as one might expect for these types of deals. Surprisingly, only about 18% of the limited deals in our dataset sold out. It is not that these deals are intrinsically less attractive: on average, limited deals outperformed unlimited deals, with 11% more coupon sales and 27% more revenue. It is possible that merchants (or Groupon) artificially limit deals as a strategy to exert pressure on customers, making them more likely to purchase on the spur of the moment.

Groupon also has a choice as to the days of the week on which it schedules each offer. Figure 4e breaks down deals by the day of the week on which they began. Even though the differences are not striking, it appears that deals starting at the beginning of the week produce less revenue, and deals starting on Friday produce the most. One possible explanation is that on Fridays, Groupon starts more multi-day deals that span the weekend. For example, 45% of three-day deals start on a Friday. However, alternative explanations, such as that the best deals are run on Friday or that people are more likely to buy on Friday, may also apply.
Category          Featured   Number of deals
Home Services     13%        370
Nightlife         25%        288
Other             24%        327
Prof. Services    18%        673
Restaurants       24%        2,794
Shopping          18%        2,176
Travel            35%        404

Table 3: A breakdown of featured deals by category.



Deals are said to be "on" when they surpass a sales thresh- 
old defined by Groupon, possibly in conjunction with the 
merchant. That is, for customers to get a discount, a mini- 
mum number of them must commit to the deal. In theory, 
this could drive a group dynamic whereby customers encour- 
age their friends to buy a coupon to reach the threshold. 
However, current deal thresholds are very low. For exam- 
ple, in our dataset, the mean threshold to total sales ratio 
is approximately 19%. Deals that surpassed their thresh- 
old did so on average just after 8am, and only 2% of deals 
with thresholds failed to reach them. Like the relationship between price and deal size, the relationship between threshold and deal size also appears to be linear when plotted on a logarithmic scale.
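The time at which a deal tipped can be read off the intraday series as the first sample at or above the threshold. This is a minimal sketch assuming time-sorted (timestamp, cumulative sales) pairs; the function name is hypothetical. With ten-minute polling, the returned time overstates the true tipping moment by less than the gap since the preceding sample.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def tipping_time(samples: List[Tuple[datetime, int]], threshold: int) -> Optional[datetime]:
    """Hypothetical helper: first sampled time at which cumulative sales reach `threshold`.

    samples must be sorted by time. The true tipping moment lies after the
    preceding sample (if any) and no later than the returned one. Returns
    None if the threshold is never reached in the samples.
    """
    for t, sold in samples:
        if sold >= threshold:
            return t
    return None


samples = [(datetime(2011, 3, 1, 7, 50), 20),
           (datetime(2011, 3, 1, 8, 0), 24),
           (datetime(2011, 3, 1, 8, 10), 31)]
print(tipping_time(samples, 25))  # 2011-03-01 08:10:00
```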

Deals can also be classified by two features over which Groupon does not have direct control: their geographic market and their category. Figure 4a shows that overall deal sizes vary considerably across cities, and not always in proportion to city population. This is in part due to localization: Long Island has a separate Groupon deal stream from New York; similarly, Los Angeles is split into multiple subareas. As for deal categories, Figure 4b suggests that the deals that are most lucrative for Groupon in terms of revenue are not the most popular with its customers. For example, while "Travel" deals produce fewer sales than most other categories on average, they produce the most revenue; conversely, "Restaurants" deals are the most popular but produce much less revenue.

Finally, it is not especially surprising, but still notable, that we observe heavy-tailed behavior throughout Figure 4, with the mean deal producing many more sales and much more revenue than the median in essentially all cases.



3. MODELING DEAL OUTCOMES 

Having considered various deal features and their individual correlations with deal size and revenue, we now consider these characteristics collectively using regression. Our goal is both to better quantify the dependence of deal outcomes on the various features and to determine whether such models are sufficiently accurate to predict the outcomes of future deals. The model we use for Groupon deals, noting that all logarithms are base e, is:

$$\log q = \beta_0 + \beta_1 \log p + \beta_2 \log t + \beta_3 d + \beta_4 f + \beta_5 l + \beta_6 w + \beta_7 c + \beta_8 g \qquad (1)$$

where q stands for the deal size, p for the coupon price, t for the threshold, d for whether the deal is run for multiple days or not, f for whether the deal is featured or not, and l for whether the deal inventory is limited or not. The values of p and t are centered to their corresponding medians (25 in both cases). This allows for a more intuitive interpretation of the regression's intercept but does not otherwise affect our results. The parameters w, c, and g are dummy-coded vectors representing the starting day of the week, category, and city relative to notional reference levels; their corresponding coefficients are also vectors. Dummy coding refers to using binary vectors to encode categorical variables, where a variable that can take on k distinct values is encoded using a binary vector of length k - 1 in which at most one entry is set to one (the reference level is encoded as the all-zero vector). We also fitted a similar log-log model to LivingSocial deals, with results similar to those we report below.
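Dummy coding as described above can be sketched as follows. The reference level maps to the all-zero vector; using Monday as the day-of-week reference mirrors the baseline deal used to interpret the intercept, but the function is a hypothetical helper, not our fitting code.

```python
from typing import List, Sequence


def dummy_code(value: str, levels: Sequence[str], reference: str) -> List[int]:
    """Hypothetical helper: encode `value` as a 0/1 vector of length k - 1.

    levels: all k possible values; reference: the level encoded as the
    all-zero vector. Every other level gets exactly one entry set to 1.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    others = [lv for lv in levels if lv != reference]
    return [1 if value == lv else 0 for lv in others]


days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(dummy_code("Fri", days, reference="Mon"))  # [0, 0, 0, 1, 0, 0]
print(dummy_code("Mon", days, reference="Mon"))  # [0, 0, 0, 0, 0, 0]
```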

The exact form of the model is motivated by the observations of Section 2.3: log p and log t are well modeled as having a linear relationship to log q, while the rest of the variables in our model are either boolean or categorical in nature. Given the high correlation of the list price with the discounted price, we have chosen to exclude it from the model to avoid introducing multicollinearity. Also, since most multi-day deals last for two days, and there is little variance in the number of sales among multi-day deals, we have chosen (after experimentation) to encode duration as a boolean feature.

We fitted the model using ordinary least squares (OLS) regression. The parameter estimates, their standard errors, and their significance levels are shown in Table 7a in the appendix. A similar model of LivingSocial deals appears in Table 7b. We focus the presentation of our results on Groupon.

The intercept of the model is the expected mean of the logarithm of the size of a deal when every explanatory variable is at its reference level (with p and t at their medians). For example, the expected mean of the log-size of a deal in the "Other" category, priced at $25, with a threshold of 25, not featured, with unlimited inventory, starting on Monday, and running for a single day in Atlanta, is 5.19. Equivalently, the expected geometric mean of the size of the same deal is e^5.19 ≈ 179.

The coefficient of log p is particularly interesting because its value is the point-price elasticity of demand for coupons. To see this, recall that the point elasticity η_p is defined as:

$$\eta_p = \frac{dq/q}{dp/p} = \frac{\partial \log q}{\partial \log p} = \beta_1 \qquad (2)$$

For deal size, the point-price elasticity given by our regression model is -0.48. Intuitively, this means that for a 1% increase in price, we expect a (roughly³) 0.48% decrease in demand. Since η_p > -1, the demand is said to be inelastic. This matches our intuition: coupons already represent heavy discounts, and as such, changes in price should have a relatively less significant impact upon demand.
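The elasticity is a local, first-order quantity. In a log-log model, scaling price by (1 + x) scales the expected geometric-mean demand by exactly (1 + x)^β1; this sketch evaluates that factor with β1 = -0.48 from the text. The helper name is hypothetical.

```python
def pct_change_in_demand(elasticity: float, pct_change_in_price: float) -> float:
    """Hypothetical helper: % change in geometric-mean demand in a log-log model.

    Scaling p by (1 + x) scales q by (1 + x) ** elasticity exactly; the
    point elasticity times x is only the small-x linear approximation.
    """
    factor = (1 + pct_change_in_price / 100) ** elasticity
    return 100 * (factor - 1)


print(round(pct_change_in_demand(-0.48, 1), 3))   # -0.476
print(round(pct_change_in_demand(-0.48, 50), 1))  # -17.7
```

A 1% increase yields about -0.476%, essentially the -0.48% approximation, whereas a 50% increase yields roughly -17.7% rather than the -24% a linear reading would suggest, which is why the interpretation is qualified as rough.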

The coefficients of non-log-transformed variables represent differences in the expected means of log-sales, and their exponentiated values represent multiplicative increases (or decreases) in the expected geometric mean of sales. For example, the expected ratio of the geometric means of sales for multi-day deals to single-day deals is e^0.22 ≈ 1.25. A simple interpretation is that by running a deal for more than a day, we expect a 25% increase in sales. The effect of featuring a deal is, as anticipated, far greater: featured deals are expected to perform 141% better than their non-featured counterparts. Limiting a deal, in the presence of all the other parameters in our model, does not have a statistically significant effect.

3 The point-price elasticity relates infinitesimal percentage changes in price to percentage changes in demand. For any small enough percentage change in price, the estimated percentage change in sales will be reasonably accurate.

Figure 5: Actual vs. predicted sales for a test set of Groupon deals, in log-log scale, for our plain regression model, as well as for the model incorporating early-morning sales.
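The interpretations of the intercept and of the indicator coefficients are simple exponentiations. This sketch reproduces them from the values stated in the text (5.19 and 0.22) and also backs out the featured coefficient implied by the reported 141% lift; that implied value is derived here, not read from the regression table.

```python
import math

intercept = 5.19      # expected log-size at the reference levels (from the text)
beta_multiday = 0.22  # coefficient on the multi-day indicator d (from the text)

baseline_geo_mean = math.exp(intercept)      # expected geometric-mean deal size
multiday_lift = math.exp(beta_multiday) - 1  # multiplicative lift minus one
# Derived, not reported: a +141% lift implies a coefficient of ln(2.41).
beta_featured_implied = math.log(1 + 1.41)

print(round(baseline_geo_mean), f"{multiday_lift:.0%}", round(beta_featured_implied, 2))
```

It prints `179 25% 0.88`.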

Overall, the F-statistic (345) and the p-value (< 2 × 10^-16) of the model indicate that we can reject the null hypothesis (that there is no relation between sales and any of the explanatory variables in our model) with high confidence. However, the R² value of the model (0.49) suggests only moderate predictive power. We tested the model against a test set of 1,522 deals that ran in the same 20 cities between August 18th and 31st. Figure 5 plots the number of sales our model predicts against actual sales. As the R² value anticipates, our predictions are generally well correlated with actual sales, with noticeably large errors for certain individual deals.
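One way to put a number on the agreement in Figure 5 for held-out deals is an out-of-sample R² computed in log space, matching the model's log-transformed response. This is a minimal sketch; the deal figures are hypothetical, and the text does not specify an evaluation procedure beyond the plot.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def out_of_sample_r_squared(actual_sales, predicted_log_sales):
    """Score predictions of log(sales) on held-out deals.
    Assumes every held-out deal sold at least one unit, so log() is defined."""
    return r_squared([math.log(s) for s in actual_sales], predicted_log_sales)

# Hypothetical held-out deals: actual unit sales and predicted log(sales).
actual = [40, 310, 1200, 95, 5000]
predicted = [math.log(x) for x in (60, 250, 900, 150, 2100)]
print(round(out_of_sample_r_squared(actual, predicted), 2))
```

Scoring in log space keeps a few very large deals (such as national ones) from dominating the error, which is consistent with plotting the comparison on a log-log scale.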

Not surprisingly, our model consistently overestimates sales for weak deals with very few actual sales, and tends to underestimate the strongest deals, indicating that additional factors not captured by our model are at work. One such factor, which we study in Section 4.1, is how social networks can influence deal sales. Another possible influence on purchasing decisions is merchant reputation, which we consider below, and in the other direction (how Groupon offers affect reputation) in Section 4.2. One clear major influence is that some deals are national in scope, and our largest underestimates appear to be for these deals. For example, as shown in Figure 5, some of the larger underestimates correspond to a national deal for Quiznos sandwiches. Incorporating these refinements to improve the predictive power of our models is future work.

We consider two variations on our basic model: 

Early-morning sales: We have incorporated cumulative first-day sales as of 7am from our time-series dataset into our basic
model. Our reasoning is that a deal site may want to obtain 
an improved prediction of deal performance early in the day 
Figure 6: Average cumulative first-day sales across all
Groupon deals in our dataset. 



Figure 7: The number of deals starting on any given day of 
the week, broken down by their eventual duration. 

(and perhaps adjust accordingly). We present data for 7am for two reasons. First, Groupon generally sends its email between 4am and 6am, and we want to see effects after the email is sent. Second, as shown in Figure 6, on average Groupon deals have sold less than 7% of their eventual first-day sales by 7am, so the time is suitable for an early prediction. Unsurprisingly, the predictive power of our model is greatly improved with this feature, as indicated by an increased R² of 0.81, and as shown in Figure 5. Detailed regression results are shown in Table 8a in the Appendix.
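The "less than 7% by 7am" figure is a per-deal share of first-day sales. The sketch below shows how such a share could be computed. It assumes a hypothetical snapshot layout (hour of day mapped to cumulative units sold) and averages per-deal fractions. The text does not say whether the average is taken over fractions or over raw counts, so that choice is our assumption.

```python
def share_sold_by(snapshots, hour):
    """snapshots: hypothetical {hour_of_day: cumulative units sold} for one
    deal's first day (hours 0-23). Returns the fraction of first-day sales
    reached by `hour`, or None if the deal sold nothing on its first day."""
    total = snapshots[max(snapshots)]  # last snapshot taken as the first-day total
    if total == 0:
        return None
    seen = [h for h in snapshots if h <= hour]
    # No snapshot yet at `hour`: treat sales so far as zero.
    return snapshots[max(seen)] / total if seen else 0.0

def average_share_sold_by(deals, hour=7):
    """Average the per-deal fraction across deals with nonzero first-day sales."""
    shares = [s for s in (share_sold_by(d, hour) for d in deals) if s is not None]
    return sum(shares) / len(shares)

deals = [{0: 0, 7: 5, 23: 100}, {6: 3, 12: 30, 23: 50}]
print(average_share_sold_by(deals, hour=7))
```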

Yelp merchant reviews: We also attempted to add the Yelp rating of merchants prior to their Groupon deal as a parameter to our model, for the deals for which we had a rating available (2,522 of them). This failed to significantly improve our model. One explanation is that the average Yelp star-rating displayed for each business may not be especially significant to Groupon buyers; instead they might prefer scanning individual reviews and ratings on Yelp. Another is that there is only moderate variance in the Yelp ratings of the merchants for which Groupon displays them: we found 68% of the reviewed merchants had an average rating between 3.5 and 4.5 stars, and 95% between 3 and 5 stars. Detailed regression results are shown in Table 8b in the Appendix. Because we limited this dataset to only those merchants for whom we could acquire a Yelp rating, we cannot obtain coefficient estimates for every deal category. For example, as we could not obtain any reviews for deals in the "Real Estate" category, we cannot compute the corresponding coefficient.

3.1 Deal Scheduling 

One further optimization available to daily deal sites is 
managing how deals are scheduled. Our data shows that 
Groupon is already managing deal schedules in non-trivial 
Table 4: The observed/expected number of times one featured category follows another.



ways. For example, there are far fewer deals offered on weekends, but more multi-day deals run over weekends. As another example, in Figure 7 we plot the duration of deals by the day of the week on which they started. The subplots show the counts of one-day, two-day, and three-day deals by their starting day of the week. We observe that Groupon initiates most one-day deals Monday through Thursday and most three-day deals on Fridays and weekends.

We also find evidence that deals are scheduled according to their category; specifically, that Groupon avoids placing featured deals of the same category back-to-back over consecutive days in the same city. This is a natural strategy to maintain user interest, related to issues of ad fatigue (or advertising wear-out) studied in other contexts [4, 6]. Table 4 summarizes an experiment we conducted to investigate this phenomenon. For each ordered pair of categories, we show the observed number of times a featured deal of the first category was followed on the next day by a featured deal of the second, followed by the expected number of such occurrences if deals had been scheduled independently at random. Deviations from random are generally small, except for certain categories on the diagonal of this matrix, where same-category pairs occur less often than expected; this indicates that Groupon avoids featuring the same category back-to-back.
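The comparison in Table 4 can be sketched as follows. The sketch assumes (our choice, not stated in the text) that the expected count for an ordered category pair (a, b) under independent random scheduling equals the number of consecutive-day pairs times the product of the two categories' overall frequencies. The schedule data are hypothetical.

```python
from collections import Counter

def transition_table(schedules):
    """schedules: {city: [featured-deal category for each day, in order]}.
    Returns {(a, b): (observed, expected)} for every ordered category pair."""
    observed, freq, n_pairs = Counter(), Counter(), 0
    for days in schedules.values():
        freq.update(days)
        for a, b in zip(days, days[1:]):  # consecutive days in the same city
            observed[(a, b)] += 1
            n_pairs += 1
    total = sum(freq.values())
    return {
        (a, b): (observed[(a, b)], n_pairs * (freq[a] / total) * (freq[b] / total))
        for a in freq
        for b in freq
    }

schedules = {"Boston": ["Food", "Spa", "Food", "Spa"], "Austin": ["Food", "Spa", "Food"]}
for (a, b), (obs, exp) in sorted(transition_table(schedules).items()):
    print(f"{a:>5} -> {b:<5} observed {obs}, expected {exp:.2f}")
```

In this toy schedule the diagonal cells (Food to Food, Spa to Spa) are observed zero times against nonzero expectations, which is the pattern the text describes for the real data.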

We note that the problem of how deal sites should best 
schedule deals to optimize revenue or sales remains an inter- 
esting direction for future work. 

4. THE IMPACT OF SOCIAL INFLUENCE 

We now move from studying solely the deal data to con- 
sidering the interplay between social networks and daily deal 
sites, specifically quantifying the impact of Facebook users 
while a deal is running, and of Yelp reviewers as they provide 
feedback about merchants. 

We focus on two distinct issues. First, while many indi- 
viduals may be enticed to purchase a Groupon based just 
on the daily e-mail, we hypothesize that many others are activated entirely or in part by social factors. One
observable quantity we leverage to investigate this hypoth- 
esis is the number of individuals who "like" a given deal 
on Facebook. Our measurement study quantifies the corre- 
lation between two sets of time-series: Groupons sold and 
Facebook likes. Our evidence indicates clear network effects 
from social influence, and is consistent with basic cascade 
models of activation from the theoretical literature. 

Second, we consider another social construct: the influ- 
ence of a Groupon sale on merchants' reputations as de- 
termined by online reviews. Specifically, we study Yelp re- 
views of restaurants before and after a Groupon offer. Not 
surprisingly, we find a surge in reviews from new customers subsequent to the offer. But we also find that reviewers mentioning "Groupons" and "coupons" give strikingly lower rating scores than those who do not, and these reviews reduce Yelp scores over time in our dataset.

4.1 Do Deals Propagate via Facebook Likes? 

Groupon deals can be shared with friends by text, e-mail, 
Facebook, Twitter, and other means. Facebook offers a 
readily observable measure, which we use as a proxy for more 
general social sharing here: Groupon has added a Facebook 
Like button on each deal's web page. Should a Facebook 
user decide that he likes a Groupon deal (no purchase nec- 
essary) and clicks the Like button, a like counter shown next 
to the button is incremented, and a short message to that 
effect is distributed to the user's Facebook friends by means 
of their news feeds. These messages may then propagate 
recursively through Facebook, potentially creating a sales- 
enhancing word-of-mouth effect. 

Here we examine whether Facebook likes correlate with 
the final deal size, and then ask whether the data appears 
consistent with current models for the effects of social networks on buying decisions. We note that recent work also suggests social spreading as a significant determinant of final deal size, proposing a model in which, once the tipping point of a deal is reached, the deal grows via a stochastic process that models social spreading [21]. That model appears orthogonal to our analysis.

We emphasize that we must be careful not to conflate 
correlation and causation; it is not clear that likes inspire 
purchases, even if the number of likes and deal sizes are cor- 
related. As a strawman, consider a model where each pur- 
chaser ignores prior likes, and pushes the Like button with 
probability p after deciding to purchase. In this setting, 
likes and purchases would be closely (linearly) correlated, 
even though likes would not promote purchases. Indeed, 
this provides an interesting null hypothesis: are likes simply proportional to sales? Our data suggests that this is not the case: Figures 8a and 8b suggest a non-linear relationship, where the deal size is roughly proportional to the number of likes raised to a power much less than 1. Also, deals with approximately the same number of likes can vary significantly in terms of final deal size. (A more detailed breakdown appears in Figure 9.)
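The strawman null hypothesis can be made concrete with a short simulation. If every purchaser independently clicks Like with probability p and ignores prior likes, the OLS slope of log(deal size) on log(likes) should come out close to 1, unlike the much smaller exponent suggested by Figure 8. The sales figures and p = 0.05 below are hypothetical.

```python
import math
import random

def loglog_slope(xs, ys):
    """OLS slope of log(y) regressed on log(x); all values must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

def strawman_likes(sales, p, rng):
    """Each of `sales` purchasers independently clicks Like with probability p."""
    return sum(1 for _ in range(sales) if rng.random() < p)

rng = random.Random(0)
sales = [rng.randint(200, 5000) for _ in range(300)]    # hypothetical deal sizes
likes = [strawman_likes(s, 0.05, rng) for s in sales]
kept = [(l, s) for l, s in zip(likes, sales) if l > 0]  # log(0) is undefined
slope = loglog_slope([l for l, _ in kept], [s for _, s in kept])
print(f"log-log slope under the strawman: {slope:.2f}")  # close to 1
```

Binomial noise in the like counts pulls the fitted slope slightly below 1, but nowhere near the "much less than 1" regime, so a clearly sublinear empirical fit argues against pure proportionality.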

We can gain some further insight by adding likes to our 
regression model. Table 9 in the Appendix summarizes the results of regressing deal size on the various deal features and Facebook activity. Adding the logarithm of the number of likes as an additional variable on the right-hand side of the model in Equation 1 improves our R² statistic from 0.49 to 0.63 for Groupon and from 0.38 to 0.67 for LivingSocial.
Figure 8: Deal size vs. number of Facebook likes (in logarithmic scale). Black dots highlight deals in LA; grey dots are used for other cities. A trend line fitted using OLS regression is also shown.
Figure 9: The number of Facebook likes is depicted on the x-axis. On the y-axis, deals are broken down by various deal features. The outer whiskers mark the 5% and 95% quantiles of the deal size and revenue, the sides of each grey rectangle the 1st and 3rd quartiles, the solid bar the median, and the solid dot the mean.



While this improvement is not directly useful for, e.g., predicting deal size a priori, it demonstrates a strong correlation between the logarithms of likes and deal size.

Social Spreading: We do not currently have (and have 
not sought) direct evidence of social spreading of deals via 
likes; this would require either detailed knowledge of the 
Facebook social network, or a detailed user study. Both are 
beyond the scope of this paper, but are interesting questions 
for future work. Here, however, we examine whether our 
data is consistent with theoretical models of social spreading 
from the literature (e.g., [13, 16]). In such models, there
is generally a seed set of users that initially recommend a 
product; then additional users activate and buy the product 
based on these recommendations. Here, we treat Facebook 
likes as recommendations for a specific deal. 

Two key features of cascade models are how the seed set is 
selected, and how inactive users are activated by neighbors. 
In our setting, we consider the seed set to be Groupon sub- 
scribers who are informed of the daily deal through Groupon's 
daily email and proceed to like the deal on Facebook. (Likely, 
those in the seed set also purchase the deal, but this is irrel- 
evant to the cascade dynamics we consider here.) Our data 
provides insight into the size of the seed set, but not into how the seed set should be selected. Previous work [5, 19, 20] has established the correlation between degree and
activity in social networks. In [20] the authors demonstrate 



that the top 50% of Facebook users by degree are respon- 
sible for most social interactions. Kwak et al. [14] make a 
similar case for Twitter, where they demonstrate that users with more followers are more likely to tweet. The potential implication, translated to our setting, is that Groupon customers who have more friends on Facebook are more likely to belong to the seed set. We test this by simulating cascade models with the seed set selected uniformly at random, and as the top-k nodes by degree.

With regard to activation, we consider two standard variations: either a user is activated with fixed probability p by each of his active neighbors, or each active user u activates user v with probability 1/d_v, where d_v is the degree of user v. The first model activates high-degree nodes more frequently. These cases correspond to the Independent Cascade (IC) and Weighted Cascade (WC) models of Kempe et al. [13].

Finally, we model that each activated user purchases the 
deal with fixed probability q, which can be thought of as the 
conversion rate. Note that we separate the issues of activa- 
tion and purchase; in our setting, activation corresponds to 
noticing the deal, rather than purchasing it. 

We ran experiments based on these models on a network 
that has common characteristics with social network graphs, 
the arXiv High Energy Physics collaboration network from 
[15]. This network consists of 9,877 nodes and 51,971 edges. The results are shown in Figure 10. We plot the number
Figure 10: Sales as predicted by two diffusion models.
Figure 11: The average Yelp star-rating for merchants before
and after their Groupon offers (line-chart), and the average 
number of reviews per merchant per month (bar-chart). 

of sales resulting from the cascade against the seed set size, averaged over 100 trials per seed set size. We ran our experiments both by selecting a random seed set and by selecting the top-k nodes by degree. The high-degree heuristic resembles our empirical findings more closely. For the IC model we set p = 2%, and for both models q = 5%. In both models, when using the top-k nodes as the seed set, we observe sublinear gains in the size of the cascade as the size of the seed set increases. These results match previous work such as [13], and give some insight into our empirical findings.
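The simulation described above can be sketched in Python under several assumptions of our own. A synthetic preferential-attachment graph stands in for the arXiv collaboration network [15]. Seed users count toward sales. Fewer trials are run than the 100 used in the text. All function names are hypothetical. The parameters p = 2% and q = 5% follow the text.

```python
import random
from collections import defaultdict

def preferential_attachment_graph(n, m, rng):
    """Undirected graph grown by preferential attachment: each new node links to
    up to m existing nodes chosen roughly in proportion to their degree."""
    adj = defaultdict(set)
    targets, pool = list(range(m)), []
    for v in range(m, n):
        for t in set(targets):  # duplicates collapse, so a node may get < m links
            adj[v].add(t)
            adj[t].add(v)
        pool.extend(targets)
        pool.extend([v] * m)
        targets = [rng.choice(pool) for _ in range(m)]
    return adj

def cascade_sales(adj, seeds, model, p, q, rng):
    """One run. "IC": each newly active node activates each inactive neighbour
    with probability p. "WC": node u activates neighbour v with probability
    1/deg(v). Afterwards every active node (seeds included) buys with probability q."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                if v in active:
                    continue
                prob = p if model == "IC" else 1.0 / len(adj[v])
                if rng.random() < prob:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return sum(1 for _ in active if rng.random() < q)

def top_k_by_degree(adj, k):
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

def mean_sales(adj, k, model, seeding, rng, p=0.02, q=0.05, trials=20):
    """Average sales over `trials` runs with a top-k or uniformly random seed set."""
    total = 0
    for _ in range(trials):
        if seeding == "top-k":
            seeds = top_k_by_degree(adj, k)
        else:
            seeds = rng.sample(list(adj), k)
        total += cascade_sales(adj, seeds, model, p, q, rng)
    return total / trials

if __name__ == "__main__":
    rng = random.Random(1)
    graph = preferential_attachment_graph(1000, 5, rng)
    for model in ("IC", "WC"):
        for k in (10, 50, 100, 200):
            top = mean_sales(graph, k, model, "top-k", rng)
            rand = mean_sales(graph, k, model, "random", rng)
            print(f"{model} k={k:>3}: top-k {top:6.1f}  random {rand:6.1f}")
```

Comparing the top-k and random columns as k grows mirrors the experiment described above; the sketch makes no claim to reproduce the exact curves in Figure 10.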

While these results suggest that social spreading is a fea- 
ture of daily deals, current theoretical models are too sim- 
plistic to capture the array of features we have found to 
be important in determining deal size, with price being the 
most obviously relevant. It may also be useful to specifically 
consider models for daily deals in the context of cascading 
recommendations (e.g., [16]); this could also be helpful in 
explaining the large observed variance in deal sizes across 
deals with the same number of likes. 

4.2 Yelp Feedback on Groupon Merchants 

A key selling point of a daily deals site is the promise 
of beneficial long-term effects for merchants participating in 
a deal offering. Since discounted deals typically result in a 
net short-term loss to the merchant as customers redeem the 
coupons, a merchant is pitched on the expectation that some 
new customers, initially attracted by the deal, will become 
repeat customers, providing a long-term gain [9]. Participating merchants should determine that these gains outweigh the costs, which include providing discounts to their existing customer base.

                              Avg. Rating    Reviews
Before period (12 months)     3.71           39,042
After period (6 months)       3.59           17,006

Table 5: Yelp reviews around a merchant's Groupon offer.

While we do not have data to directly evaluate the long- 
term financial impact for Groupon merchants, we consider 
a novel, alternative approach to concretely quantify the im- 
pact of Groupon deals on a merchant's reputation. Specifi- 
cally, we examine the extent to which a Groupon deal affects 
review scores at Yelp, a popular online review site. We view review scores as a useful proxy for both direct repeat business and new business from word-of-mouth effects.

Groupon often displays a Yelp rating for the merchant 
offering a specific deal. For each Groupon deal associated 
with a Yelp rating, we collected all the individual reviews 
posted on Yelp up through August 2011. Yelp reviews consist of a star-rating ranging from one to five stars, the review text, and the date the review was written. We associated each review with an offset, measured in months, from the earliest Groupon deal in our dataset for the corresponding merchant; e.g., if a Groupon offer occurred in June 2011, a review for that merchant dated March 2011 would have an offset of -3. For each merchant, and for each integer offset (up through July 2011), we computed the merchant's average star-rating, thereby constructing a time-series of the star-rating value oriented around the first Groupon offer that we observed for a merchant. Figure 11 presents our findings. The line-chart displays the average star-rating across all Yelp-rated merchants in our dataset for each offset, with the x-axis depicting the offset from the time of the offer.
The bar-chart depicts the average review volume over the 
same period, using the scale on the right-hand side of the 
y-axis, and measured in reviews per merchant per month. 
Lighter shading indicates total review volume, while darker 
shading indicates the volume of reviews that contain the 
keyword "Groupon". (Mentions of Groupon in reviews with 
negative offsets are mostly due to user reviews for merchants 
who ran an additional offer before we began collecting data. 
Excluding these merchants does not materially change our 
findings.) 
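The offset construction and the averaging behind the line-chart can be sketched as follows. This assumes a hypothetical input layout of (merchant, review date, stars) tuples and averages first per merchant and offset, then across merchants, as the text describes. The sample reviews are illustrative.

```python
from collections import defaultdict
from datetime import date

def month_offset(review_date: date, offer_date: date) -> int:
    """Whole-month offset of a review from a merchant's first observed Groupon offer."""
    return (review_date.year - offer_date.year) * 12 + (review_date.month - offer_date.month)

def rating_by_offset(reviews, first_offer):
    """reviews: iterable of (merchant, review_date, stars);
    first_offer: {merchant: date of earliest observed Groupon deal}.
    Averages stars per (merchant, offset), then across merchants per offset."""
    per_merchant = defaultdict(list)
    for merchant, when, stars in reviews:
        per_merchant[(merchant, month_offset(when, first_offer[merchant]))].append(stars)
    per_offset = defaultdict(list)
    for (_, offset), stars in per_merchant.items():
        per_offset[offset].append(sum(stars) / len(stars))
    return {o: sum(v) / len(v) for o, v in sorted(per_offset.items())}

print(month_offset(date(2011, 3, 14), date(2011, 6, 2)))  # -3, as in the example above
```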

In looking at Figure 11, we first consider the behavior
prior to Groupon offers. While month to month scores vary, 
they appear essentially flat. The average number of reviews 
slowly increases (likely as Yelp itself is growing). There are 
few Groupon mentions. We think of these qualitative be- 
haviors as our baseline. 

Subsequent to the Groupon deal, we see marked changes 
in behavior. First, on average, the number of user-contributed 
reviews increases significantly after a Groupon offer, and 
based on the proportion of Groupon mentions, the Groupon 
deal appears to be the proximate cause. We quantify this 
change using the monthly number of Yelp reviews as a proxy. 
Our methodology is as follows: let Y_i be the average number of reviews merchant i received per month in the three months before the Groupon offer, and let Z_i be the number of reviews in the month of the Groupon offer. We compute the percentage change in the number of reviews for each merchant in the month of the Groupon offer as (Z_i - Y_i)/Y_i. (This requires Y_i > 0.) Similarly, we can compute the percentage change for the month after the Groupon offer. We find that the average percentage increase in reviews across all merchants who had received at least one review in the 3 months prior to the Groupon offer is 44%. Meanwhile, the average month-over-month growth in the number of reviews for merchants prior to their Groupon offers is about 5%. Similarly, the average percentage increase in reviews the month after the Groupon offer is 84%. Roughly 20% of merchants in our dataset (461 out of 2,332) had received zero reviews in the 3 months prior to the Groupon offer. Of these, 270 received at least one review within two months after the Groupon offer.
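The percentage-change measure (Z_i - Y_i)/Y_i can be written out as a short sketch. It assumes a hypothetical per-merchant mapping from month offset to review count. Merchants with Y_i = 0 are excluded, as the text requires. The counts are illustrative.

```python
def pct_change_in_reviews(counts, month=0):
    """counts: {month offset: number of reviews} for one merchant.
    Y = mean monthly reviews over offsets -3..-1; Z = reviews at `month`
    (0 = month of the offer, 1 = the month after). Returns 100*(Z - Y)/Y,
    or None when Y = 0, since the measure requires Y > 0."""
    y = sum(counts.get(m, 0) for m in (-3, -2, -1)) / 3
    if y == 0:
        return None
    z = counts.get(month, 0)
    return 100.0 * (z - y) / y

def average_pct_change(merchants, month=0):
    """Average over merchants with at least one review in the 3 prior months."""
    changes = [c for c in (pct_change_in_reviews(m, month) for m in merchants) if c is not None]
    return sum(changes) / len(changes)

merchants = [{-3: 2, -2: 2, -1: 2, 0: 3, 1: 4}, {-1: 3, 0: 1, 1: 6}, {0: 5}]
print(average_pct_change(merchants, 0))  # 25.0
print(average_pct_change(merchants, 1))  # 300.0
```

Because the measure is an unweighted average of per-merchant ratios, a merchant with a small baseline can contribute a very large percentage, which is one reason the month-after figure can exceed the offer-month figure by a wide margin.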

Our second conclusion is that Yelp star ratings decline 
after a Groupon deal. To quantify the magnitude of the de- 
cline, we employed the following methodology: as our base- 
line, we computed the average of all reviews with a negative 
offset (before the Groupon offer). Similarly, we computed 
the average of all reviews with a positive offset (after the 
Groupon offer). We deemed reviews with offset zero as am- 
biguous, due to data collection granularity, and they were 
not considered. The results of the before-and-after comparison are depicted in Table 5. The average drop in ratings is 0.12. This could significantly affect any ordering of merchants by Yelp rating. Also, Yelp scores are reported and displayed in discretized half-star increments. Thus, an average drop of 0.12 suggests that a significant number of merchants may lose a half-star due to rounding. This could have a potentially important effect on a business; a recent study reports that for independent restaurants a one-star increase in Yelp rating leads to a 9% increase in revenue [17]. However, the transitory nature of Groupon-driven reviews, in addition to the complexities of modeling hidden factors like weighted moving averages, clouds our ability to pinpoint the reputational ramifications precisely.
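The before/after comparison and its rounding consequence can be sketched as follows. Excluding offset zero follows the text. The half-star display rule (round to the nearest half star, halves rounding up) is our assumption about Yelp's presentation, not something the text specifies. The ratings are illustrative.

```python
import math

def before_after_means(reviews):
    """reviews: iterable of (month_offset, stars), pooled across merchants.
    Offset 0 is treated as ambiguous and skipped."""
    reviews = list(reviews)
    before = [s for o, s in reviews if o < 0]
    after = [s for o, s in reviews if o > 0]
    return sum(before) / len(before), sum(after) / len(after)

def displayed_stars(avg):
    """Assumed display rule: round to the nearest half star, halves rounding up."""
    return math.floor(avg * 2 + 0.5) / 2

print(before_after_means([(-2, 4), (-1, 5), (0, 1), (1, 3), (2, 4)]))  # (4.5, 3.5)
# A 0.12 drop, the average decline in Table 5, can cost a displayed half star:
print(displayed_stars(3.76), displayed_stars(3.76 - 0.12))  # 4.0 3.5
```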

To provide further attribution for the decline, we con- 
ducted additional text analysis on the content of individual 
reviews. The results are summarized in Table 6, which categorizes the user-contributed reviews according to occurrence of the keywords "Groupon" and "coupon". Reviews mentioning either keyword are associated with star ratings that are 10% lower on average than reviews that mention neither, while the very small fraction of reviews mentioning both keywords score roughly 20% lower on average.
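The breakdown in Table 6 can be sketched as a keyword classifier plus per-cell averages. The matching rule (case-insensitive whole words, singular or plural) is our assumption, since the text does not specify it. The relative drops are computed from the Table 6 cell averages, measured against reviews that mention neither keyword.

```python
import re
from collections import defaultdict

# Assumed matching rule: case-insensitive whole words, singular or plural.
GROUPON = re.compile(r"\bgroupons?\b", re.IGNORECASE)
COUPON = re.compile(r"\bcoupons?\b", re.IGNORECASE)

def keyword_breakdown(reviews):
    """reviews: iterable of (text, stars). Returns
    {(mentions_groupon, mentions_coupon): (average stars, count)}."""
    cells = defaultdict(list)
    for text, stars in reviews:
        cells[(bool(GROUPON.search(text)), bool(COUPON.search(text)))].append(stars)
    return {key: (sum(v) / len(v), len(v)) for key, v in cells.items()}

def relative_drop(rating, baseline):
    """Percentage by which `rating` falls below `baseline`."""
    return 100.0 * (baseline - rating) / baseline

# Cell averages from Table 6, against reviews mentioning neither keyword (3.71).
print(f"{relative_drop(3.36, 3.71):.1f}%")  # 9.4%
print(f"{relative_drop(2.98, 3.71):.1f}%")  # 19.7%
```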

Ultimately, the economic ramifications of reputational ef- 
fects due to running a daily deal remain uncertain. The 
positive impact of quickly reaching a broad, new audience is 
precisely in line with the daily deals sales pitch, and is borne 
out by the surge in reviews that we witness in our dataset. 
However, the lower-on-average rating scores in those reviews mentioning Groupon provide a cautionary note: this could
indicate that a more critical audience is being reached, or 
that the fit between the merchant and these new customers 
is more tenuous than with existing customers. 

5. CONCLUSION AND FUTURE WORK 

Our examination of daily deal sites, and particularly Groupon, has used data-driven analysis to investigate relationships between deal attributes and deal size beyond simple measures such as the offer price. Indeed, the scope of our investigation extends well beyond deal sites, to consider Groupon's relationship with the larger electronic commerce ecosystem, including Facebook and Yelp. We believe we expose significant complexity in understanding behavior in these systems. In particular, predicting deal sizes in settings where price, a multiplicity of other deal parameters, and the potential for social cascades all affect the outcome poses a clear, albeit difficult, challenge. We also suggest mining publicly accessible Internet data, such as content on review sites, to benchmark the success of deal sites for merchants in terms of the effect on their reputation. Expanding this approach to other data sources (such as Twitter or blogs) could yield further insights.

                        Coupon mentioned
                        Yes             No
Groupon       Yes       2.98 (354)      3.36 (4,315)
mentioned     No        3.36 (1,166)    3.71 (50,213)

Table 6: Yelp reviews broken down by mentions of the keywords "Groupon" and "Coupon". The average rating as well as the number of reviews (in parentheses) are shown.

While our focus here has been on data analysis, we believe 
our work opens the door to several significant questions in 
both modeling and optimizing deal sites and similar elec- 
tronic commerce systems. 
APPENDIX

A. DETAILED REGRESSION RESULTS 
Table 7: The regression model of deal sizes on their various features.
(a) The regression model of deal sizes on their features with the inclusion of early-morning sales as an independent variable.



(b) The regression model of deal sizes on their features 
with the inclusion of Yelp rating as an independent 
variable. 



Table 8: Two alternative regression models for Groupon deal sizes. 
Table 9: The regression model of deal sizes on their features with the inclusion of Facebook activity as an independent variable.



